Multi-processor device

ABSTRACT

The present invention intends to provide a high-performance multi-processor device in which independent buses and external bus interfaces are provided for each group of processors of different architectures, if a single chip includes a plurality of multi-processor groups. A multi-processor device of the present invention comprises a plurality of processors including first and second groups of processors of different architectures such as CPUs, SIMD type super-parallel processors, and DSPs, a first bus which is a CPU bus to which the first processor group is coupled, a second bus which is an internal peripheral bus to which the second processor group is coupled, independent of the first bus, a first external bus interface to which the first bus is coupled, and a second external bus interface to which the second bus is coupled, over a single semiconductor chip.

CROSS-REFERENCE TO RELATED APPLICATIONS

The disclosure of Japanese Patent Application No. 2007-11367 filed on Jan. 22, 2007, including the specification, drawings and abstract, is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

The present invention relates to optimal bus configurations and layouts of components of a multi-processor device in which a plurality of groups of processors are implemented in a single LSI.

In multi-processor devices in which multiple processors of the same architecture and multiple processors of different architectures such as CPU and DSP are implemented over a single semiconductor chip, the following bus configurations have been used. In one configuration, all of the processors are coupled to a single bus, as described in Non-Patent Document 1 mentioned below. In another configuration, to couple multiple processors using the same protocol to a bus, local buses are provided for each CPU and the local buses are coupled together with a bridge, as described in Non-Patent Document 2 mentioned below.

In the case where all of the processors are coupled to a single bus, the processors are coupled to the same bus, whether the LSI multi-processor device is equipped with one external bus interface or multiple external bus interfaces.

In the case where multiple local buses are coupled together with a bridge, one processor is coupled to a local bus, the respective local buses are coupled to a single bus master, and a single bus is coupled to an external bus interface.

[Non-Patent Document 1]

-   Toshiba, EmotionEngine, SCE/IBM/Toshiba, Cell, Feb. 9, 2005, [searched on Jan. 9, 2007] Internet <http://ascii24.com/news/i/tech/article/2005/02/09/654178-000.html>

[Non-Patent Document 2]

-   Renesas, G1, February 2006, ISSCC2006 FIG. 29.5.1 “A Power Management Scheme Controlling 20 Power Domains for a Single-Chip Mobile Processor”

SUMMARY OF THE INVENTION

However, if multiple processors of different architectures are coupled to a single bus, the following problem arises, because different-architecture processors generally differ in processing performance and speed: the operation of high-speed processors is impeded by low-speed processors, and the performance of the high-speed processors deteriorates. If the multi-processor device includes CPUs and processors that are mainly for data processing, such as DSPs and SIMD type super-parallel processors, a further problem arises because the DSPs and SIMD type super-parallel processors handle a large amount of data: the CPUs have to wait a long time before accessing the bus, and the benefit of the enhanced performance of the multi-processor device is not fully obtained.

With regard to coherency between caches, coherency is ensured for multiple processors of the same architecture, but cache coherency between different-architecture processors is not ensured in practice, which presents an inconsistency problem.

If a multi-processor oriented OS is run, it is often enabled only for processors of the same architecture, as different-architecture processors are supplied by different developers and an OS designed for all of these processors is rarely made. Therefore, separate OSs must be provided for different-architecture processors. When processors running different OSs are connected to a single bus, each OS shares that bus with a bus master IP that is unknown to it. This poses a problem in which performance enhancements of the multi-processor oriented OS, such as scheduling, are impaired.

Even when multiple local buses are coupled together with a bridge, the respective local buses are coupled to a single bus master and, therefore, a combination of a CPU and a local bus is considered as a single CPU. This poses the same problem as the above situation where different-architecture processors are connected to the same bus.

Because the processors are coupled to the same bus, whether the LSI multi-processor device is equipped with one external bus interface or multiple external bus interfaces, the following problem is presented. A bus portion to which an external bus interface is coupled is blocked by a request for access to the external bus from another bus and cannot yield the desired performance. A bus portion to which an external bus interface is not coupled experiences performance deterioration when access to the external bus interface from another bus occurs.

Therefore, the present invention has been made to solve the above problems and intends to provide a high-performance multi-processor device in which independent buses and external bus interfaces are provided for each group of processors of different architectures.

In one embodiment of the present invention, a multi-processor device comprises, over a single semiconductor chip, a plurality of processors including a first group of processors and a second group of processors, a first bus to which the first group of processors is coupled, a second bus to which the second group of processors is coupled, a first external bus interface to which the first bus is coupled, and a second external bus interface to which the second bus is coupled.

According to one embodiment of the present invention, when a plurality of groups of processors are implemented on a single semiconductor chip, independent buses and external bus interfaces are provided for each group of processors of different architectures. By this configuration, each group of processors can operate independently and, therefore, coordination and bus contention between processors are reduced. It is possible to realize at low cost a high-performance multi-processor system consuming low power.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing a configuration of a multi-processor device of Embodiment 1 of the present invention.

FIG. 2 is a layout view of components of the multi-processor device in Embodiment 2 of the invention.

FIG. 3 is another layout view of the components of the multi-processor device in Embodiment 2 of the invention.

FIG. 4 is yet another layout view of the components of the multi-processor device in Embodiment 2 of the invention.

FIG. 5 is a layout view of the components of the multi-processor device in Embodiment 3 of the invention.

FIG. 6 is a diagram showing a configuration of a multi-processor device of Embodiment 4 of the invention.

FIG. 7 is a diagram showing a configuration of a multi-processor device of Embodiment 5 of the invention.

FIG. 8 is a timing chart in Embodiment 6 of the invention.

FIG. 9 is a diagram showing a clock supply circuit of prior art.

FIG. 10 is a diagram showing a clock supply circuit in Embodiment 6 of the invention.

FIG. 11 is a block diagram of software in Embodiment 7 of the invention.

FIG. 12 is a block diagram of software in Embodiment 7 of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiment 1

FIG. 1 is a diagram showing a configuration of a multi-processor device of Embodiment 1 of the present invention. This multi-processor device is formed over a single semiconductor chip 1. Multiple processors, namely, CPUs CPU1 through CPU8, are arranged in parallel (a first group of processors), making a Symmetric Multiple Processor (SMP) structure. Each CPU includes primary caches (I-cache, D-cache), a local memory (U-LM), a memory management unit (MMU), and a debugger (SDI). The eight CPUs are coupled to a CPU bus 10 (a first bus) and the CPU bus 10 is coupled to a secondary cache 12 via a CPU bus controller 11. The secondary cache 12 is coupled to an external bus 1 via a DDR2 I/F 13 (a first external bus interface).

The CPUs operate internally at 533 MHz at maximum. The operating frequency of each CPU is converted by a bus interface inside the CPU, so that the CPU is coupled to the CPU bus 10 at 266 MHz at maximum. The secondary cache 12 and the DDR2 I/F 13 operate at 266 MHz at maximum.

The LSI device of the present invention has an internal peripheral bus 14 (a second bus) in addition to the CPU bus 10 on the same semiconductor chip. The following are coupled to the internal peripheral bus 14: a peripheral circuit 15 including ICU (interrupt controller), ITIM (interval timer), UART (Universal Asynchronous Receiver Transmitter: clock asynchronous serial I/O), CSIO (clock synchronous serial I/O), CLKC (clock controller), etc., a DMAC 16 (DMA controller), a built-in SRAM 17, SMP-structure matrix type super-parallel processors (SIMD type super-parallel processors 31, 32, a second group of processors), an external bus controller 18 (a second external bus interface), and a CPU 19 of another architecture. The internal peripheral bus 14 is coupled to an external bus 2 via the external bus controller 18, thereby forming an external bus access path for connection to external devices such as SDRAM, ROM, RAM, and IO.

The internal peripheral bus 14 operates at 133 MHz at maximum and the DMAC 16, built-in SRAM 17, and peripheral circuit 15 also operate at 133 MHz at maximum. The SIMD type super-parallel processors operate internally at 266 MHz at maximum. The operating frequency of each super-parallel processor is converted by a bus interface inside it to couple the processor to the internal peripheral bus 14. Likewise, the CPU 19 operates internally at 266 MHz at maximum and this operating frequency is converted by a bus interface inside it to couple it to the internal peripheral bus 14. Because there is a difference in processing performance and speed between the processor clusters, as described above, these processor clusters are controlled using separate clocks that differ in frequency and phase.
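The maximum frequencies quoted above can be tabulated to show the conversion that each internal bus interface performs. The following is a minimal sketch only: the class and field names are hypothetical, and treating 533 MHz to 266 MHz as a nominal 2:1 ratio is an assumption, since the text does not state the ratio explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockDomain:
    """One processor cluster and the bus it attaches to (names are illustrative)."""
    name: str
    core_mhz: float  # maximum internal operating frequency from the text
    bus_mhz: float   # maximum frequency of the attached bus from the text

    def ratio(self) -> int:
        # Nearest integer core-to-bus ratio; 533/266 is assumed to be nominally 2:1.
        return round(self.core_mhz / self.bus_mhz)


DOMAINS = [
    ClockDomain("CPU1-CPU8 -> CPU bus 10", 533, 266),
    ClockDomain("SIMD 31, 32 -> internal peripheral bus 14", 266, 133),
    ClockDomain("CPU 19 -> internal peripheral bus 14", 266, 133),
]

for d in DOMAINS:
    print(f"{d.name}: {d.core_mhz:g} MHz core, {d.bus_mhz:g} MHz bus, about {d.ratio()}:1")
```

In this reading, every cluster runs at roughly twice the frequency of its bus, while the two buses themselves differ by a further factor of about two.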

The CPU bus 10 and the internal peripheral bus 14 are coupled through the secondary cache 12. Therefore, the CPUs CPU1 through CPU8 not only can get access to the external bus 1 through the secondary cache 12 and via the DDR2 I/F 13, but also can access resources on the internal peripheral bus 14 through the secondary cache 12. Thus, the CPUs CPU1 through CPU8 can get access to another external bus 2 via the external bus controller 18, though this path is long and the frequency of the internal peripheral bus is lower, thus resulting in lower performance of data transfer. The modules that are coupled to the internal peripheral bus 14 can get access to the external bus 2 via the external bus controller 18, but cannot get access to the external bus 1.
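The access paths of Embodiment 1 can be modeled as a directed graph in which an edge means "can issue an access toward". The sketch below is a simplified illustration: the node names are hypothetical labels derived from the reference numerals, and the one-way edge from the secondary cache 12 to the internal peripheral bus 14 encodes the statement that modules on the peripheral bus cannot reach the external bus 1.

```python
from collections import deque

# Directed access edges for Embodiment 1 (simplified, illustrative model).
LINKS = {
    "CPU1-8": ["cpu_bus_10"],
    "cpu_bus_10": ["cpu_bus_controller_11"],
    "cpu_bus_controller_11": ["secondary_cache_12"],
    "secondary_cache_12": ["ddr2_if_13", "peripheral_bus_14"],
    "ddr2_if_13": ["external_bus_1"],
    "SIMD_31_32": ["peripheral_bus_14"],
    "CPU_19": ["peripheral_bus_14"],
    "DMAC_16": ["peripheral_bus_14"],
    "peripheral_bus_14": ["ext_bus_controller_18", "sram_17", "peripheral_circuit_15"],
    "ext_bus_controller_18": ["external_bus_2"],
}


def hops_from(start):
    """Breadth-first search: map every reachable node to its hop count from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in LINKS.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


for master in ("CPU1-8", "SIMD_31_32", "CPU_19"):
    dist = hops_from(master)
    for bus in ("external_bus_1", "external_bus_2"):
        status = f"{dist[bus]} hops" if bus in dist else "not reachable"
        print(f"{master} -> {bus}: {status}")
```

In this model the CPUs reach the external bus 2 over a longer path than the external bus 1, and the masters on the internal peripheral bus 14 reach only the external bus 2, matching the description above.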

The CPUs CPU1 through CPU8 are of the same architecture. The contents of the primary and secondary caches are coherency-controlled so as to remain consistent, so there is no concern about malfunction of the CPUs. Even in a case where a multi-processor oriented OS is used, high performance can be delivered, because only the eight CPUs of the same architecture and the secondary cache 12 are connected to the CPU bus 10 and the external bus 1 is accessible only from the CPUs CPU1 through CPU8. In particular, the SIMD type super-parallel processors operate at lower speed than the CPUs and handle a large amount of data when they process data. Consequently, these processors are liable to occupy the bus for a long time. However, this does not affect the data transfer on the CPU bus 10, because the SIMD type super-parallel processors have access to the external bus 2 through the internal peripheral bus 14.

From the viewpoint of the SIMD type super-parallel processors, the CPUs primarily use the path of the external bus 1 from the CPU bus 10. Therefore, there is no need to release the internal peripheral bus 14 for the CPUs during data transfer and efficient data transfer can be performed. This effect is especially significant because the multi-processor consists of a plurality of CPUs. In this embodiment example of the invention, there are eight CPUs in the multi-processor device. However, in a case where 16, 32, or more processors share the same bus with the SIMD type super-parallel processors oriented to data processing, data processing latency occurs. If the present invention is applied to such a case, its effect will be more significant.

The CPU 19 is a small microprocessor whose operating speed and processing performance are lower than those of the CPUs CPU1 through CPU8, but it consumes less power and occupies a smaller area. This CPU can perform operations that do not require high arithmetic processing performance, such as activating the peripheral circuit 15, checking a timer, and power management using the CLKC. Therefore, even if the CPU 19 shares the same bus with the SIMD type super-parallel processors, it does not pose a problem in which the performance of the SIMD type super-parallel processors is deteriorated.

Embodiment 2

FIGS. 2 through 4 are layout views of components of the multi-processor device in Embodiment 2 of the present invention. FIG. 2 illustrates an example of layout in which the modules constituting the multi-processor device of Embodiment 1 are actually arranged over a silicon wafer. FIG. 3 presents the layout example of FIG. 2 in another view in which the modules associated with the CPU bus (CPUs CPU1 through CPU8 and CPU bus controller) are represented collectively as a CPU bus region 20 and the modules associated with the internal peripheral bus (SIMD type super-parallel processors 31, 32, CPU 19, built-in SRAM 17, peripheral circuit 15, external bus controller 18, and DMAC 16) are represented collectively as an internal peripheral bus region 21. FIG. 4 is a layout view in which voltage supply/GND lines 22 are wired.

By laying out the components of the multi-processor device as illustrated in FIG. 2, the internal peripheral bus 14 and the CPU bus 10 can be run across the shortest distances as shown. This layout enables high-speed operation with less possibility of congestion due to complicated cross wiring and hence consumes a smaller die area and is less costly. In the wiring, the number of crossing signal lines other than the buses decreases, and slowdown due to wiring congestion and long-distance wiring is not likely to occur. Hence, an LSI device with low power consumption can be realized at low cost. The device area is divided into bus regions that are easy to control for power shutdown and the like.

There is a difference in operating frequency and arithmetic processing capability between the internal peripheral bus region 21 and the CPU bus region 20 and, consequently, these regions have different power consumptions. Low-impedance wiring is required in the CPU bus region 20 with higher clock frequency and larger power consumption. Relatively high impedance is allowable in the internal peripheral bus region 21 with lower clock frequency and smaller power consumption. Low-impedance wiring in the region with larger power consumption can be implemented by wiring of wide lines or closely spaced wiring. As an adverse effect of this, the voltage supply/GND lines 22 occupy more area in the wiring layer and wiring of other signal lines and the like becomes difficult. As a result, the LSI device area and cost increase, and additional roundabout wiring of signal lines increases wiring capacitance, which in turn increases power consumption. If these regions are scattered and intermixed, low-impedance wiring has to be performed throughout the device area to ensure stable operation. However, this makes the device area larger and the cost higher.

In the layout where the device area is divided into the CPU bus region 20 with larger power consumption and the internal peripheral bus region 21 with smaller power consumption, as shown in FIG. 3, low-impedance wiring of voltage supply/GND lines 22 needs to be applied only in the CPU bus region 20. For example, wiring can be performed such that wide lines are closely spaced in the CPU bus region 20 and narrow lines are sparsely spaced in the internal peripheral bus region 21, as shown in FIG. 4. In this way, unnecessary wiring of voltage supply lines is avoided and stable operation can be assured at low cost. Similarly, voltage supply terminals can be allocated such that voltage supply/GND terminals 23 in the CPU bus region 20 are closely spaced and voltage supply/GND terminals 23 in the internal peripheral bus region 21 are sparsely spaced.

In FIG. 4, lines with widths drawn over each region are voltage supply or GND lines and circles at outer edges of the chip are voltage supply or GND terminals. Although the figure schematically shows a small number of somewhat wide lines, a great number of extra-fine lines are actually wired. For example, in a manufacturing process for wiring of signal lines with a minimum width of 0.2 μm, 1 μm wide lines are wired at pitches of 4 μm in the CPU bus region 20 and 0.4 μm wide lines are wired at pitches of 100 μm in the internal peripheral bus region 21. This way of wiring enables assuring stable operation, while avoiding unnecessary wiring of voltage supply/GND lines 22. Since no external bus is coupled to the CPU bus region 20 (see FIG. 1), even this region does not require a very large number of terminals. Application of the layout of the present embodiment can realize the multi-processor device in which adverse effects are reduced to an insignificant level.
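The wiring figures quoted above can be turned into a rough estimate of how much of a wiring layer the voltage supply/GND lines occupy in each region. This is a sketch under stated assumptions: the lines are taken to be parallel, in a single layer, with the pitch measured from line center to line center; the function and variable names are illustrative and not part of the disclosure.

```python
def metal_coverage(width_um: float, pitch_um: float) -> float:
    """Fraction of a wiring layer occupied by parallel lines of given width and pitch."""
    if not 0 < width_um <= pitch_um:
        raise ValueError("width must be positive and no larger than the pitch")
    return width_um / pitch_um


# Width and pitch values quoted in the text for the two regions.
CPU_BUS_REGION = {"width_um": 1.0, "pitch_um": 4.0}
PERIPHERAL_BUS_REGION = {"width_um": 0.4, "pitch_um": 100.0}

cpu_cov = metal_coverage(**CPU_BUS_REGION)
per_cov = metal_coverage(**PERIPHERAL_BUS_REGION)

print(f"CPU bus region 20: {cpu_cov:.1%} of the layer")
print(f"Internal peripheral bus region 21: {per_cov:.1%} of the layer")
print(f"Coverage ratio: {cpu_cov / per_cov:.1f}x")
```

Under these assumptions the supply lines take about 25% of the layer in the CPU bus region 20 and about 0.4% in the internal peripheral bus region 21, which illustrates how much wiring area is left free for signal lines when dense supply wiring is confined to one region.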

In the present embodiment, the external bus 1 and the external bus 2 are disposed apart from each other at the top and bottom edges of the chip. Because the external bus controller 18 and the DDR2 I/F 13 have high driving capability, they consume a large amount of power and are prone to produce power-supply noise or the like. However, in the layout of the present embodiment, the external bus controller 18, DDR2 I/F 13, and CPUs, which carry large current, are disposed apart from each other. Local concentration of power does not take place and therefore heat generation is uniform throughout the chip. The external bus controller 18, DDR2 I/F 13, and CPUs are sensitive to noise and temperature change. However, as they are placed apart from each other, influence of noise and heat generation on each other is reduced.

By thus disposing the modules with larger power consumption, which are sensitive to noise, apart from each other, mutual noise interference is reduced. Hence, the multi-processor device can be designed with an estimate of a smaller margin for noise. Since power consumption is uniform throughout the device and there is no local power concentration, wiring of voltage supply lines can be simplified. Besides, there is no local heat generation and the device can be designed with an estimate of a smaller margin for temperature change. Therefore, it is possible to realize at low cost the LSI device occupying a small area and consuming low power, while assuring stable operation.

Embodiment 3

FIG. 5 is an example of layout of the modules of the multi-processor device of Embodiment 1 configured on an actual silicon wafer. In comparison with Embodiment 2, the changes are the positional relationship between the CPU bus controller module and the peripheral circuit module, the position and size of the built-in SRAM 17, and the shapes of the CPU 19 and the secondary caches 12.

As regards the positional relationship between the CPU bus controller module and the peripheral circuit module, in most cases of layout using an automatic wiring tool, buses are wired between each CPU and the CPU bus controller module as shown in FIG. 5, rather than as a straight bus wiring that divides the CPU region into exactly two parts as shown in FIG. 2. In such cases, although some of the CPU buses slightly overlap with the internal peripheral bus 14, almost the same effect as in Embodiment 2 can be obtained. It may be preferred to place the CPU bus controller module in the vicinity of the centroid of the area encompassing the CPUs and the secondary caches 12 as shown in FIG. 5. In addition, the built-in SRAM 17 may be smaller than that provided in Embodiment 2 and, if the SRAM is infrequently accessed and high operating speed is not required of it, its position may be changed flexibly as shown in FIG. 5. This can make the overall device area smaller and the cost lower.

In bus wiring to the built-in SRAM 17, a buffer circuit 24 is placed at a branch point from the internal peripheral bus 14. Doing so can prevent a decrease in the speed of the internal peripheral bus 14 and an increase in its power consumption due to extended wiring of the internal peripheral bus 14. Insertion of the buffer circuit 24 poses no problem, because high-speed access to the built-in SRAM 17 is not required.

Embodiment 4

FIG. 6 is a diagram showing a configuration of a multi-processor device of Embodiment 4 of the present invention. Differences from Embodiment 1 are described below. The CPU bus 10 and the internal peripheral bus 14 are coupled through a bus bridge 25 circuit. Therefore, the CPUs CPU1 through CPU8 not only can get access to the external bus 1 through the secondary cache 12 and via the DDR2 I/F 13, but also can access resources on the internal peripheral bus 14 through the bus bridge 25. Thus, the CPUs CPU1 through CPU8 can get access to another external bus 2 via the external bus controller 18, though this path is long and the frequency of the internal peripheral bus is lower, thus resulting in lower performance of data transfer. However, data obtained by access to the external bus 2 and the internal peripheral bus 14 through the bus bridge 25 is excluded from caching in the secondary cache 12. The modules that are coupled to the internal peripheral bus 14 can get access to the external bus 2 via the external bus controller 18 and, unlike in Embodiment 1, to the external bus 1 as well through the bus bridge 25.
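The caching rule of Embodiment 4 for CPU-originated accesses can be written down as a small lookup. This is only an illustrative sketch: the route lists and names are hypothetical labels built from the reference numerals, the CPU bus controller 11 is omitted for brevity, and accesses originating on the internal peripheral bus 14 are not modeled because the text does not state how they are cached.

```python
# Routes taken by accesses from CPU1 through CPU8 in Embodiment 4 (illustrative labels).
CPU_ROUTES = {
    "external_bus_1": [
        "cpu_bus_10", "secondary_cache_12", "ddr2_if_13", "external_bus_1",
    ],
    "internal_peripheral_bus_14": [
        "cpu_bus_10", "bus_bridge_25", "internal_peripheral_bus_14",
    ],
    "external_bus_2": [
        "cpu_bus_10", "bus_bridge_25", "internal_peripheral_bus_14",
        "ext_bus_controller_18", "external_bus_2",
    ],
}


def l2_cacheable(target: str) -> bool:
    """Data fetched through the bus bridge 25 is excluded from the secondary cache 12."""
    return "bus_bridge_25" not in CPU_ROUTES[target]


for target, route in CPU_ROUTES.items():
    print(f"{target}: {' -> '.join(route)} (L2 cacheable: {l2_cacheable(target)})")
```

Only the route through the secondary cache 12 to the external bus 1 is treated as cacheable here, which mirrors the exclusion described above.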

The CPUs CPU1 through CPU8 are of the same architecture. The contents of the primary and secondary caches are coherency-controlled so as to remain consistent, so there is no concern about malfunction of the CPUs. Even in a case where a multi-processor oriented OS is used, high performance can be delivered, because only the eight CPUs of the same architecture, the secondary cache 12, and the bus bridge 25 are connected to the CPU bus 10 and the external bus 1 is mostly accessed from the CPUs CPU1 through CPU8, but infrequently accessed from the modules coupled to the internal peripheral bus 14.

Other configuration details and effects are the same as for Embodiment 1 and, therefore, description thereof is not repeated.

Embodiment 5

FIG. 7 is a diagram showing a configuration of a multi-processor device of Embodiment 5 of the present invention. The difference from Embodiment 1 lies in that, instead of the SIMD type super-parallel processors 31, 32, DSPs 41, 42 are coupled to the internal peripheral bus. Although, in this embodiment, the secondary cache 12 acts as a bridge between the CPU bus 10 and the internal peripheral bus 14, a dedicated bus bridge 25 may be used as in Embodiment 4. Other configuration details and effects are the same as for Embodiment 1 and, therefore, description thereof is not repeated.

Embodiment 6

FIG. 8 is a timing chart representing the relationship between the clock of the CPUs (CPU clock) in Embodiments 1 through 5 and the CPU bus clock (bus clock). Cases where the frequency of the CPU clock is higher than the frequency of the CPU bus clock are considered. In FIG. 8, the cases where CPU clock frequency and bus clock frequency are at ratios of 1:1, 2:1, 4:1, 8:1 are shown as examples. Clocks divided by n (n=1, 2, 4, 8) are clocks obtained by dividing the frequency of the CPU clock according to the above ratios.

In the present invention, a bus clock which is presented in FIG. 8 is used as the clock of the CPU bus 10 (see FIG. 1), instead of a clock divided by n. A clock supply circuit, when a clock divided by n is used, is shown in FIG. 9. A clock supply circuit, when Sync. and a bus clock are used, is shown in FIG. 10. Both a frequency divider in FIG. 9 and a sync. generator in FIG. 10 produce outputs from CLKC input thereto. Usually, there is only a single CLKC in an LSI and, hence, a clock divided by n or Sync. may be transmitted on a long path to some CPUs and a buffer or the like may actually be inserted.

When a clock divided by n and Sync. are compared, the number of times of switching (switching frequency) is the same for both, but the phase of a clock divided by n must be exactly aligned with the phase of the CPU clock, whereas this is not required for Sync. Therefore, using Sync. eliminates a need for an unnecessarily large buffer and a buffer for generating a delay which introduces inefficiency, thus making it possible to realize at low cost the LSI device occupying a small area and consuming low power.

As regards the quality of the clock of the CPU bus 10, in the case of FIG. 9 where a clock divided by n is generated, a branch point from the CPU clock is far and the frequency divider is inserted. In the case of FIG. 10 where a bus clock is generated, a branch point from the CPU clock is near and only an AND circuit is inserted. Therefore, in the latter case, a phase difference (skew) with regard to the CPU clock can be smaller and operation at a higher frequency is enabled. In addition, few or no buffers for ensuring hold time are needed for transfer between each CPU and the CPU bus 10. Thus, it is possible to realize at low cost the LSI device occupying a small area and consuming low power.
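The two clocking schemes can be compared with a small idealized waveform model. The sketch below rests on assumptions that are not spelled out in the text: the bus clock is taken to be the CPU clock ANDed with Sync., Sync. is taken to be high for one whole CPU cycle out of every n, and all delays are ignored. Each CPU cycle is represented by two samples (high half, low half), and all function names are illustrative.

```python
CYCLES = 8  # CPU clock cycles simulated; a multiple of every ratio used below


def cpu_clock():
    # Two samples per CPU cycle: high half, then low half.
    return [1, 0] * CYCLES


def divided_clock(n):
    # 50% duty clock with a period of n CPU cycles (2n samples), as in the FIG. 9 scheme.
    return ([1] * n + [0] * n) * (CYCLES // n)


def sync_signal(n):
    # Assumed Sync. shape: high for one whole CPU cycle out of every n.
    return ([1, 1] + [0, 0] * (n - 1)) * (CYCLES // n)


def bus_clock(n):
    # Assumed FIG. 10 scheme: the AND circuit gates the CPU clock with Sync.
    return [c & s for c, s in zip(cpu_clock(), sync_signal(n))]


def transitions(wave):
    # Number of level changes, counted cyclically over the simulated window.
    return sum(wave[i] != wave[i - 1] for i in range(len(wave)))


def rising_edges(wave):
    # Sample indices at which the wave goes from 0 to 1 (cyclic).
    return [i for i in range(len(wave)) if wave[i] == 1 and wave[i - 1] == 0]


for n in (1, 2, 4, 8):
    print(
        f"n={n}: divided-clock transitions={transitions(divided_clock(n))}, "
        f"Sync. transitions={transitions(sync_signal(n))}, "
        f"bus-clock rising edges at {rising_edges(bus_clock(n))}"
    )
```

In this model, for n = 2, 4, and 8 the divided clock and Sync. switch the same number of times, while every rising edge of the gated bus clock is by construction a rising edge of the CPU clock. For n = 1 the assumed Sync. is constantly high and the bus clock is simply the CPU clock.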

While the relationship between the CPU clock and CPU bus clock was explained in the present embodiment, the same is true for the relationship between the clock of the SIMD type super-parallel processors and the clock of the internal peripheral bus 14 as well as the relationship between the clock of the CPU 19 and the clock of the internal peripheral bus 14.

Embodiment 7

FIG. 11 is a block diagram of software for a system using the multi-processor device according to any of Embodiments 1 through 6. The software structure includes device drivers (drivers) for each processor and OSs at a layer on top of the driver layer. OS1 is responsible for control of the CPUs CPU1 through CPU8 and OS2 for control of the SIMD type super-parallel processors 31, 32 and the CPU 19. It is conceivable that one OS, for example, OS1, is a non-realtime OS such as Linux and the other, OS2, is a realtime OS such as ITRON. OS1 is optimized for the CPU architecture, and only the eight CPUs of the same architecture, the secondary cache 12, and the bus bridge 25 are connected to the CPU bus 10. High performance can be delivered, because the external bus 1 is mostly accessed from the CPUs CPU1 through CPU8, but infrequently accessed from the modules coupled to the internal peripheral bus 14. The contents of the primary and secondary caches are coherency-controlled by OS1 so as to be consistent and the coherency problem can be coped with optimally. Meanwhile, the OS2 side has the external bus 2 independently of OS1 and, therefore, there is almost no need for coordination for resources with OS1, and high performance can be delivered.

FIG. 12 shows another software structure including an additional OS3 for the CPU 19. In addition to the effects described for FIG. 11, this software structure is more efficient, as each OS is dedicated to governing the processors or processor of the same architecture.
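The two software structures can be compared by listing which processors each OS governs and checking whether any OS spans more than one architecture. This is a minimal sketch: the architecture labels are hypothetical tags chosen for illustration, and the third OS is assumed to govern the CPU 19, the small CPU of another architecture on the internal peripheral bus 14.

```python
# Architecture tag for each processor group (labels are illustrative only).
ARCHITECTURE = {
    "CPU1-CPU8": "smp_cpu",
    "SIMD 31, 32": "simd_super_parallel",
    "CPU 19": "small_cpu",
}

# OS-to-processor assignments for the two software structures.
FIG11 = {
    "OS1": ["CPU1-CPU8"],
    "OS2": ["SIMD 31, 32", "CPU 19"],
}
FIG12 = {
    "OS1": ["CPU1-CPU8"],
    "OS2": ["SIMD 31, 32"],
    "OS3": ["CPU 19"],
}


def architectures_per_os(assignment):
    """Map each OS to the set of architectures it has to govern."""
    return {
        os_name: {ARCHITECTURE[p] for p in procs}
        for os_name, procs in assignment.items()
    }


def single_architecture_per_os(assignment):
    """True when every OS governs processors of exactly one architecture."""
    return all(len(archs) == 1 for archs in architectures_per_os(assignment).values())


print("FIG. 11:", architectures_per_os(FIG11), single_architecture_per_os(FIG11))
print("FIG. 12:", architectures_per_os(FIG12), single_architecture_per_os(FIG12))
```

In the FIG. 11 structure OS2 spans two architectures, whereas in the FIG. 12 structure each OS governs a single architecture, which is the property the text credits for the added efficiency.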

1-18. (canceled)
19. A semiconductor device comprising: a plurality of a first type of processors disposed in a first layout region of a single chip in plan view and being controlled using a first clock, the single chip having a rectangular shape with four sides in plan view; a plurality of a second type of processors disposed separately from the plurality of the first type of processors in a second layout region of the single chip in plan view, each said second type of processor having a different architecture from said first type of processors and being controlled using a second clock which differs in frequency or phase from the first clock; a first bus provided over the single chip to which the plurality of the first type of processors are coupled; a second bus provided over the single chip to which the plurality of the second type of processors are coupled; a first external bus interface provided over the single chip, to which a first external bus provided external to the single chip is coupled; and a second external bus interface provided over the single chip, to which a second external bus provided external to the single chip is coupled, wherein the plurality of the first type of processors is configured to access the first external bus via the first bus and the first external bus interface, wherein the plurality of the second type of processors is configured to access the second external bus via the second bus and the second external bus interface, and wherein, in plan view, the first external bus interface has a portion disposed adjacent to a first side of the four sides of the single chip, and the second external bus interface has a portion disposed adjacent to a second side, different from the first side, of the four sides of the single chip.
20. The semiconductor device according to claim 19, wherein the first and second sides of the single chip are opposite each other.
21. The semiconductor device according to claim 20, wherein the second external bus interface has another portion disposed adjacent to a third side different from the first and second sides, of the four sides of the single chip.
22. The semiconductor device according to claim 19, wherein the plurality of the first type of processors is configured to access the second external bus via the first bus, the second bus, and the second external bus interface.
23. The semiconductor device according to claim 19, wherein the plurality of the first type of processors has a symmetric multiple structure and the plurality of the second type of processors has a symmetric multiple structure different from the plurality of the first type of processors in architecture.
24. The semiconductor device according to claim 19, wherein the first type of processor has a first Symmetric Multiple Processor (SMP) structure, and the second type of processor has a second SMP structure which is different from the first SMP structure.
25. The semiconductor device according to claim 19, wherein the plurality of the first type of processors is controlled with a first clock and the plurality of the second type of processors is controlled with a second clock different from the first clock in frequency or phase.